Recent Developments within BulTreeBank

Authors

  • Petya Osenova
  • Kiril Simov
Abstract

The paper discusses recent developments in BulTreeBank (BTB). First, these developments include the preparatory steps for transferring richer linguistic knowledge from the original BTB into BTB-UD so that enhanced dependencies can be added in the next release in May 2018. The new line of research also extends the BTB valency lexicon with subatom-based embeddings for English. The aim is to check automatically how well they detect the core participants in an event. Since there are not enough resources for Bulgarian, we rely on transferring embeddings trained on English data, enhanced with mappings to the Bulgarian WordNet and evaluated over BTB as a gold standard.

Similar resources

Recent Developments in Discrete Event Systems

This article is a brief exposure of the process approach to a newly emerging area called "discrete event systems" in control theory and summarizes some of the recent developments in this area. Discrete event systems is an area of research that is developing within the interstices of computer, control and communication sciences. The basic direction of research addresses issues in the analysis an...


Event Ordering. Temporal Annotation on Top of the BulTreeBank

This paper describes the preliminary work on the project of extending the BulTreeBank with temporal information that will serve as a golden standard for Bulgarian language. We outline a flexible markup scheme that is based on a language-specific verb taxonomy and test its capabilities by implementing algorithms for temporal entities recognition in the CLaRK System tool.


A Data-Driven Dependency Parser for Bulgarian

One of the main motivations for building treebanks is that they facilitate the development of syntactic parsers, by providing realistic data for evaluation as well as inductive learning. In this paper we present what we believe to be the first robust data-driven parser for Bulgarian, trained and evaluated on data from BulTreeBank (Simov et al., 2002). The parser uses dependency-based representa...


Segmentation Layers in the Group of the Predicate: a Case Study of Bulgarian within the BulTreeBank Framework

This paper describes the development of a regular grammar that automatically recognizes and delimits segments in the group of the predicate in sentences of Bulgarian. The language-specific segmentation is performed at the level of partial parsing where reliable, meaningful and useful entities are formed called chunks. The significance of the grammar development lies in the fact that it is a plu...


Building a Linguistically Interpreted Corpus of Bulgarian: the BulTreeBank

In the field of Human Language Technology (HLT), the existence of linguistically interpreted real-world texts provides the license necessary for a given language to enter the area of high-tech applications. The significance of BulTreeBank is the granting of an HLT license to a “less processed” language like Bulgarian which, until recently, has been formally modelled and processed mainly on the ...


Publication date: 2018